Method for real-time video processing involving changing features of an object in the video

ABSTRACT

A method for real-time video processing for changing features of an object in a video, the method comprises: providing an object in the video, the object being at least partially and at least occasionally presented in frames of the video; detecting the object in the video; generating a list of at least one element of the object, the list being based on the object's features to be changed according to a request for modification; detecting the at least one element of the object in the video; tracking the at least one element of the object in the video; and transforming the frames of the video such that the at least one element of the object is modified according to the request for modification.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of, and claims the benefit of, U.S. patent application Ser. No. 15/921,282, filed on Mar. 14, 2018, which is a continuation of, and claims the benefit of, U.S. patent application Ser. No. 14/314,324, filed on Jun. 25, 2014, which claims the benefit of U.S. Provisional Application Ser. No. 61/936,016, filed on Feb. 5, 2014.

BACKGROUND OF THE INVENTION

Technical Field

The disclosed embodiments relate generally to the field of real-time video processing. In particular, the disclosed embodiments relate to a computerized system and a method for real-time video processing that involves changing features of an object in a video.

Description of the Related Art

At the present time, some programs can provide processing of still images. For example, U.S. Patent Application Publication No. US2007268312, incorporated herein by reference, discloses a method of replacing face elements with components made by users for real-time video. However, it is not possible to process real-time video in such a way that an object shown in the video can be modified naturally in real time with certain effects. In the case of a human face, such effects can include making the face look younger or older.

Thus, new and improved systems and methods are needed that would enable real-time video stream processing that involves changing features of an object in the video stream.

SUMMARY OF INVENTION

The embodiments described herein are directed to systems and methods that substantially obviate one or more of the above and other problems associated with the conventional technology for real-time video stream processing. In accordance with one aspect of the embodiments described herein, there is provided a method for real-time video processing for changing features of an object in a video, the method comprising: providing an object in the video, the object being at least partially and at least occasionally presented in frames of the video; detecting the object in the video; generating a list of at least one element of the object, the list being based on the object's features to be changed according to a request for modification; detecting the at least one element of the object in the video; tracking the at least one element of the object in the video; and transforming the frames of the video such that the at least one element of the object is modified according to the request for modification.

In one or more embodiments, transforming the frames of the video comprises: calculating characteristic points for each of the at least one element of the object; generating a mesh based on the calculated characteristic points for each of the at least one element of the object; tracking the at least one element of the object in the video, wherein the tracking comprises aligning the mesh for each of the at least one element with a position of the corresponding each of the at least one element from frame to frame; generating a set of first points on the mesh for each of the at least one element of the object based on the request for modification; generating a set of second points on the mesh based on the set of first points and the request for modification; and transforming the frames of the video such that the at least one element of the object is modified, wherein, for each of the at least one element of the object, the set of first points comes into the set of second points using the mesh.

In one or more embodiments, the computer-implemented method further comprises: generating a square grid associated with the background of the object in the video; and transforming the background of the object using the square grid in accordance with the modification of the at least one element of the object.

In one or more embodiments, the computer-implemented method further comprises: generating at least one square grid associated with regions of the object that are adjacent to the modified at least one element of the object; and modifying the regions of the object that are adjacent to the modified at least one element of the object in accordance with the modification of the at least one element of the object using the at least one square grid.

In one or more embodiments, the detecting of the object in the video is implemented with the use of the Viola-Jones method.

In one or more embodiments, the calculating of the object's characteristic points is implemented with the use of an Active Shape Model (ASM).

In one or more embodiments, transforming the frames of the video comprises: calculating characteristic points for each of the at least one element of the object; generating a mesh based on the calculated characteristic points for each of the at least one element of the object; generating a set of first points on the mesh for each of the at least one element of the object based on the request for modification; generating at least one area based on the set of first points for each of the at least one element of the object; tracking the at least one element of the object in the video, wherein the tracking comprises aligning the at least one area of each of the at least one element with a position of the corresponding each of the at least one element from frame to frame; and transforming the frames of the video such that the properties of the at least one area are modified based on the request for modification.

In one or more embodiments, modification of the properties of the at least one area includes changing the color of the at least one area.

In one or more embodiments, modification of the properties of the at least one area includes removing at least part of the at least one area from the frames of the video.

In one or more embodiments, modification of the properties of the at least one area includes adding at least one new object to the at least one area, the at least one new object being based on the request for modification.

In one or more embodiments, objects to be modified include a human's face.

In one or more embodiments, the processed video comprises a video stream.

In accordance with another aspect of the embodiments described herein, there is provided a mobile computerized system comprising a central processing unit and a memory, the memory storing instructions for: providing an object in the video, the object being at least partially and at least occasionally presented in frames of the video; detecting the object in the video; generating a list of at least one element of the object, the list being based on the object's features to be changed according to a request for modification; detecting the at least one element of the object in the video; tracking the at least one element of the object in the video; and transforming the frames of the video such that the at least one element of the object is modified according to the request for modification.

In one or more embodiments, transforming the frames of the video comprises: calculating characteristic points for each of the at least one element of the object; generating a mesh based on the calculated characteristic points for each of the at least one element of the object; tracking the at least one element of the object in the video, wherein the tracking comprises aligning the mesh for each of the at least one element with a position of the corresponding each of the at least one element from frame to frame; generating a set of first points on the mesh for each of the at least one element of the object based on the request for modification; generating a set of second points on the mesh based on the set of first points and the request for modification; and transforming the frames of the video such that the at least one element of the object is modified, wherein, for each of the at least one element of the object, the set of first points comes into the set of second points using the mesh.

In one or more embodiments, the computer-implemented method further comprises: generating a square grid associated with the background of the object in the video; and transforming the background of the object using the square grid in accordance with the modification of the at least one element of the object.

In one or more embodiments, the computer-implemented method further comprises: generating at least one square grid associated with regions of the object that are adjacent to the modified at least one element of the object; and modifying the regions of the object that are adjacent to the modified at least one element of the object in accordance with the modification of the at least one element of the object using the at least one square grid.

In one or more embodiments, the detecting of the object in the video is implemented with the use of the Viola-Jones method.

In one or more embodiments, the calculating of the object's characteristic points is implemented with the use of an Active Shape Model (ASM).

In one or more embodiments, transforming the frames of the video comprises: calculating characteristic points for each of the at least one element of the object; generating a mesh based on the calculated characteristic points for each of the at least one element of the object; generating a set of first points on the mesh for each of the at least one element of the object based on the request for modification; generating at least one area based on the set of first points for each of the at least one element of the object; tracking the at least one element of the object in the video, wherein the tracking comprises aligning the at least one area of each of the at least one element with a position of the corresponding each of the at least one element from frame to frame; and transforming the frames of the video such that the properties of the at least one area are modified based on the request for modification.

In one or more embodiments, modification of the properties of the at least one area includes changing the color of the at least one area.

In one or more embodiments, modification of the properties of the at least one area includes removing at least part of the at least one area from the frames of the video.

In one or more embodiments, modification of the properties of the at least one area includes adding at least one new object to the at least one area, the at least one new object being based on the request for modification.

In one or more embodiments, objects to be modified include a human's face.

BRIEF DESCRIPTION OF DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, exemplify the embodiments of the present invention and, together with the description, serve to explain and illustrate principles of the inventive technique. Specifically:

FIG. 1 illustrates a method for real-time video processing for changing features of an object in a video according to the first embodiment of the invention.

FIG. 2 illustrates facial feature reference points detected by an ASM algorithm used in the method according to one embodiment of the present invention.

FIG. 3 illustrates the Candide-3 model used in the method according to one embodiment of the present invention.

FIG. 4a and FIG. 4b show an example of a mean face (a) and an example of a current observation (b).

FIG. 5 illustrates the Candide-3 model at a frame used in the method according to one embodiment of the present invention.

FIG. 6 illustrates an exemplary embodiment of a computer platform based on which the techniques described herein may be implemented.

DETAILED DESCRIPTION

In the following detailed description, reference will be made to the accompanying drawing(s), in which identical functional elements are designated with like numerals. The aforementioned accompanying drawings show, by way of illustration and not by way of limitation, specific embodiments and implementations consistent with principles of the present invention. These implementations are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other implementations may be utilized and that structural changes and/or substitutions of various elements may be made without departing from the scope and spirit of the present invention. The following detailed description is, therefore, not to be construed in a limited sense. Additionally, the various embodiments of the invention as described may be implemented in the form of software running on a general-purpose computer, in the form of specialized hardware, or as a combination of software and hardware.

It will be appreciated that the method for real-time video processing can be performed with any kind of video data, e.g. video streams, video files saved in a memory of a computerized system of any kind (such as mobile computer devices, desktop computer devices and others), and all other possible types of video data understandable for those skilled in the art. Any kind of video data can be processed, and the embodiments disclosed herein are not intended to limit the scope of the present invention by indicating a certain type of video data.

The embodiments disclosed below are aimed at processing video streams; however, all other types of video data, including video files saved in a memory of a computerized system, can be processed by the methods of the present invention. For example, a user can load video files and save them in a memory of his computerized system, and such video files can also be processed by the methods of the present invention. In accordance with one aspect of the embodiments described herein, there is provided a computerized system and a computer-implemented method for processing a real-time video stream that involves changing features of an object in the video stream. The described method may be implemented using any kind of computing device, including desktops, laptops, tablet computers, mobile phones, music players, multimedia players, etc., having any kind of generally used operating system such as Windows®, iOS®, Android® and others. All disclosed embodiments and examples are non-limiting to the invention and are disclosed for illustrative purposes only.

It is important to note that any objects can be processed by the embodiments of the described method, including, without limitation, such objects as a human's face and parts of a human body, animals, and other living creatures or non-living things whose images can be transported in a real-time video stream.

The method 100 according to the first embodiment of the invention is illustrated in FIG. 1. The method 100 is preferably used for an object in a video stream that is at least partially and at least occasionally presented in frames of the video stream. In other words, the method 100 is also applicable to objects that are not presented in frames of a video stream all the time. According to the method 100, a request for modification of the object, including changing its features, is received (stage 110). The mentioned request can be issued by a user having a relation to the video stream, by a system enabling processing of the video stream, or by any other source.

Next, the object from the request for modification is detected in the video stream (stage 120), for example with the use of the conventional Viola-Jones method, and the request for modification is analyzed by generating a list having one or more elements of the object (stage 130), such that the mentioned list is based on the object's features that must be changed according to the request.

Further, in one or more embodiments, the elements of the object are detected (stage 140) and tracked (stage 150) in the video stream.

Finally, in one or more embodiments, the elements of the object are modified according to the request for modification, thus transforming the frames of the video stream (stage 160).

It shall be noted that transformation of frames of a video stream can be performed by different methods for different kinds of transformation. For example, for transformations of frames mostly referring to changing the forms of an object's elements, the second embodiment of the invention can be used. According to the second embodiment, characteristic points are first calculated for each element of an object. Hereinafter, characteristic points refer to points of an object which relate to its elements and are used in changing features of this object. It is possible to calculate characteristic points with the use of an Active Shape Model (ASM) or other known methods. Then, a mesh based on the characteristic points is generated for each of the at least one element of the object. This mesh is used in the following stage of tracking the elements of the object in the video stream. In particular, in the process of tracking, the mentioned mesh for each element is aligned with the position of that element. Further, two sets of additional points are generated on the mesh, namely a set of first points and a set of second points. The set of first points is generated for each element based on a request for modification, and the set of second points is generated for each element based on the set of first points and the request for modification. Then, the frames of the video stream can be transformed by modifying the elements of the object on the basis of the sets of first and second points and the mesh.
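The following is a minimal sketch of one possible realization of this warp, assuming the characteristic points have already been detected and the sets of first and second points have been generated from the modification request; scikit-image, which is not named in the disclosure, supplies the piecewise-affine mesh warp, and the helper name `transform_frame` is hypothetical.

```python
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def transform_frame(frame, first_points, second_points):
    """Warp `frame` so content at `first_points` appears at `second_points`.

    Both point sets are (N, 2) arrays of (x, y) coordinates generated on
    the mesh from the modification request.
    """
    tform = PiecewiseAffineTransform()
    # warp() uses the transform as a map from output to input coordinates,
    # so we estimate second -> first: each modified (second) location is
    # filled with pixels sampled from the original (first) location.
    tform.estimate(np.asarray(second_points), np.asarray(first_points))
    # warp() returns a float image; pixels outside the triangulated mesh
    # are zero-filled unless the point sets cover the whole frame (see
    # the background-grid sketch below).
    return warp(frame, tform)
```

Applied per frame, with the mesh re-aligned by the tracker before each call, this yields the modified stream.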

In such a method the background of the modified object can be changed or distorted. To prevent this effect, it is possible to generate a square grid associated with the background of the object and to transform the background of the object based on modifications of the elements of the object using the square grid.
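Continuing the sketch above, one plausible way to realize the square grid is to append a regular grid of fixed control points to both point sets, so the piecewise warp maps the background onto itself; the `step` spacing is an illustrative choice, not a value from the disclosure.

```python
import numpy as np

def add_background_grid(first_points, second_points, frame_shape, step=32):
    """Append a square grid of fixed anchor points to both point sets.

    Identical grid points in the source and destination sets pin the
    background in place while the element points still move.  In practice,
    grid points falling inside the modified region would be filtered out
    first.  `step` is an assumed spacing in pixels.
    """
    h, w = frame_shape[:2]
    ys, xs = np.mgrid[0:h:step, 0:w:step]
    grid = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
    return (np.vstack([first_points, grid]),
            np.vstack([second_points, grid]))
```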

As can be understood by those skilled in the art, not only the background of the object but also some of its regions adjacent to the modified elements can be changed or distorted. Here, one or several square grids associated with the mentioned regions of the object can be generated, and the regions can be modified in accordance with the modification of the elements of the object by using the generated square grid or grids.

In one or more embodiments, transformations of frames referring to changing some areas of an object using its elements can be performed by the third embodiment of the invention, which is similar to the second embodiment. More specifically, transformation of frames according to the third embodiment begins with calculating characteristic points for each element of an object and generating a mesh based on the calculated characteristic points. After that, a set of first points is generated on the mesh for each element of the object on the basis of a request for modification. Then one or more areas based on the set of first points are generated for each element. Finally, the elements of the object are tracked by aligning the area for each element with the position of that element, and the properties of the areas can be modified based on the request for modification, thus transforming the frames of the video stream.

Depending on the nature of the request for modification, properties of the mentioned areas can be transformed in different ways (one such transformation is sketched after the list):

- changing the color of areas;
- removing at least some part of the areas from the frames of the video stream;
- including one or more new objects, which are based on a request for modification, into the areas.
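As an illustration of the first option, here is a minimal sketch of a hue shift inside a given area, assuming OpenCV (not named in the disclosure) and a boolean mask delimiting the area:

```python
import numpy as np
import cv2  # OpenCV; an assumed dependency

def recolor_area(frame_bgr, area_mask, hue_shift=20):
    """Shift the hue of pixels inside `area_mask` (boolean HxW array)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # OpenCV stores 8-bit hue in the range 0..179, hence the modulus.
    shifted = (hsv[..., 0].astype(np.int32) + hue_shift) % 180
    hsv[..., 0] = np.where(area_mask, shifted, hsv[..., 0]).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```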

It should be noted that different areas, or different parts of such areas, can be modified in different ways as mentioned above, and properties of the mentioned areas can also be modified in manners other than the specific exemplary modifications described above, as will be apparent to those skilled in the art.

Face detection and face tracking are discussed below in greater detail.

Face Detection and Initialization

In one or more embodiments, in the algorithm for changing proportions, a user first sends a request for changing the proportions of an object in a video stream. The next step in the algorithm involves detecting the object in the video stream.

In one or more embodiments, the face is detected in an image with the use of the Viola-Jones method, a fast and quite accurate method for detecting the face region. Then, an Active Shape Model (ASM) algorithm is applied to the face region of the image to detect facial feature reference points. However, it should be appreciated that other methods and algorithms suitable for face detection can be used.
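A minimal sketch of this detection step, assuming OpenCV's bundled Viola-Jones (Haar cascade) detector stands in for the disclosure's unspecified implementation:

```python
import cv2  # OpenCV ships a trained Viola-Jones (Haar cascade) face detector

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(frame_bgr):
    """Return (x, y, w, h) of the largest detected face, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5)
    return max(faces, key=lambda f: f[2] * f[3]) if len(faces) else None
```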

In one or more embodiments, facial features are located by locating landmarks. A landmark represents a distinguishable point present in most of the images under consideration, for example, the location of the left eye pupil (FIG. 2).

In one or more embodiments, a set of landmarks forms a shape. Shapes are represented as vectors: all the x- followed by all the y-coordinates of the points in the shape. One shape is aligned to another with a similarity transform (allowing translation, scaling, and rotation) that minimizes the average Euclidean distance between shape points. The mean shape is the mean of the aligned training shapes (which in the present disclosure are manually landmarked faces).
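A sketch of this alignment step, assuming landmark arrays of shape (N, 2); this is the standard Procrustes/Kabsch construction rather than code from the disclosure, and reflection handling is omitted for brevity.

```python
import numpy as np

def align_shape(shape, target):
    """Align `shape` to `target` with a similarity transform (translation,
    uniform scale, rotation) minimizing the mean squared point distance."""
    mu_s, mu_t = shape.mean(axis=0), target.mean(axis=0)
    s, t = shape - mu_s, target - mu_t
    u, _, vt = np.linalg.svd(s.T @ t)   # cross-covariance H = s^T t
    r = u @ vt                          # rotation such that s @ r ~ t
    scale = np.sum(t * (s @ r)) / np.sum(s ** 2)
    return scale * (s @ r) + mu_t
```

Iteratively aligning all training shapes to the current mean and re-averaging yields the mean shape referenced above.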

In one or more embodiments, in accordance with the ASM algorithm, the search for landmarks is subsequently started from the mean shape aligned to the position and size of the face determined by a global face detector. The algorithm then repeats the following two steps until convergence: (i) suggest a tentative shape by adjusting the locations of shape points by template matching of the image texture around each point; (ii) conform the tentative shape to a global shape model. The individual template matches are unreliable, and the shape model pools the results of the weak template matchers to form a stronger overall classifier. The entire search is repeated at each level in an image pyramid, from coarse to fine resolution. It follows that two types of sub-model make up the ASM: the profile model and the shape model.
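The two-step loop might be organized as below; `profile_models` and `shape_model` are hypothetical wrappers for the profile matching and shape constraints described next, not names from the disclosure.

```python
import numpy as np

def asm_search(image, start_shape, profile_models, shape_model, n_iters=10):
    """One pyramid level of the ASM search loop."""
    shape = start_shape
    for _ in range(n_iters):
        # (i) tentative shape: move each landmark to its best profile match
        tentative = np.array([pm.best_match(image, pt)
                              for pm, pt in zip(profile_models, shape)])
        # (ii) conform the tentative shape to the global shape model
        shape = shape_model.conform(tentative)
    return shape
```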

In one or more embodiments, the profile models (one for each landmark at each pyramid level) are used to locate the approximate position of each landmark by template matching. Any template matcher can be used, but the classical ASM forms a fixed-length normalized gradient vector (called the profile) by sampling the image along a line (called the whisker) orthogonal to the shape boundary at the landmark. During training on manually landmarked faces, the mean profile vector $\bar{g}$ and the profile covariance matrix $S_g$ are calculated at each landmark. During searching, the landmark is displaced along the whisker to the pixel whose profile $g$ has the lowest Mahalanobis distance from the mean profile $\bar{g}$, where

$$\text{MahalanobisDistance} = (g - \bar{g})^T S_g^{-1} (g - \bar{g}). \quad (1)$$
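A sketch of this profile search, assuming one sampled profile per candidate offset along the whisker and a precomputed inverse covariance $S_g^{-1}$:

```python
import numpy as np

def best_profile_offset(candidate_profiles, mean_profile, cov_inv):
    """Return the index of the whisker offset whose profile g minimizes
    the Mahalanobis distance of Equation (1); `cov_inv` is S_g^{-1}."""
    d = candidate_profiles - mean_profile
    # d_i^T S_g^{-1} d_i for every candidate row i at once
    dists = np.einsum('ij,jk,ik->i', d, cov_inv, d)
    return int(np.argmin(dists))
```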

The shape model specifies allowable constellations of landmarks. It generates a shape $\hat{x}$ with

$$\hat{x} = \bar{x} + \Phi b, \quad (2)$$

where $\bar{x}$ is the mean shape, $b$ is a parameter vector, and $\Phi$ is a matrix of selected eigenvectors of the covariance matrix $S_s$ of the points of the aligned training shapes. Using a standard principal components approach, the model has as much variation in the training set as desired by ordering the eigenvalues $\lambda_i$ of $S_s$ and keeping an appropriate number of the corresponding eigenvectors in $\Phi$. In the method, a single shape model for the entire ASM is used, but it is scaled for each pyramid level.

Equation 2 is then used to generate various shapes by varying the vector parameter $b$. By keeping the elements of $b$ within limits (determined during model building) it is possible to ensure that generated face shapes are lifelike.

Conversely, given a suggested shape $x$, it is possible to calculate the parameter $b$ that allows Equation 2 to best approximate $x$ with a model shape $\hat{x}$. An iterative algorithm, described by Cootes and Taylor, is used that gives the $b$ and $T$ that minimize

$$\operatorname{distance}(x, T(\bar{x} + \Phi b)), \quad (3)$$

where $T$ is a similarity transform that maps the model space into the image space.
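A sketch of Equations 2 and 3 under the simplifying assumptions that the similarity transform $T$ has already been applied and that the columns of $\Phi$ are orthonormal, so the least-squares fit reduces to a projection:

```python
import numpy as np

def generate_shape(mean_shape, phi, b, b_limits):
    """x_hat = x_bar + Phi b (Equation 2), with b clipped to the limits
    found during model building so generated shapes stay lifelike."""
    b = np.clip(b, -b_limits, b_limits)
    return mean_shape + phi @ b

def fit_shape_params(x, mean_shape, phi):
    """Best-fit b for a suggested shape x (cf. Equation 3); with
    orthonormal eigenvector columns in Phi this is a projection."""
    return phi.T @ (x - mean_shape)
```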

In one or more embodiments, a mapping can be built from the facial feature reference points detected by ASM to Candide-3 points, which gives the x and y coordinates of the Candide-3 points. Candide is a parameterized face mask specifically developed for model-based coding of human faces. Its low number of polygons (approximately 100) allows fast reconstruction with moderate computing power. Candide is controlled by global and local Action Units (AUs). The global ones correspond to rotations around three axes. The local Action Units control the mimics of the face so that different expressions can be obtained.

The following equation system can be made, knowing the Candide-3 points' x and y coordinates:

$$\sum_{j=1}^{m} X_{ij} B_j = x_i, \quad (4)$$

$$\sum_{j=1}^{m} Y_{ij} B_j = y_i, \quad (5)$$

where $B_j$ is the intensity of the $j$-th shape unit; $x_i$, $y_i$ are the $i$-th point coordinates; and $X_{ij}$, $Y_{ij}$ are coefficients which denote how the $i$-th point coordinates are changed by the $j$-th shape unit. In this case, the system is overdetermined, so it cannot be solved precisely. Thus, the following minimization is made:

$$\left( \sum_{j=1}^{m} X_{ij} B_j - x_i \right)^2 + \left( \sum_{j=1}^{m} Y_{ij} B_j - y_i \right)^2 \rightarrow \min. \quad (6)$$

Let us denote

$$X = \left( (X_{ij})^T, (Y_{ij})^T \right)^T, \quad x = \left( (x_i)^T, (y_i)^T \right)^T, \quad B = (B_j)^T. \quad (7)$$

This equation system is linear, so its least-squares solution is

$$B = (X^T X)^{-1} X^T x. \quad (8)$$
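A sketch of this least-squares step; a dedicated solver is used instead of forming the explicit inverse of Equation 8, which is numerically safer but algebraically equivalent:

```python
import numpy as np

def solve_shape_units(X_coef, Y_coef, x_pts, y_pts):
    """Least-squares solution of the overdetermined system (4)-(5),
    i.e. B = (X^T X)^{-1} X^T x (Equation 8)."""
    A = np.vstack([X_coef, Y_coef])        # stacked x- and y-equations (7)
    rhs = np.concatenate([x_pts, y_pts])
    B, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return B
```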

In one or more embodiments, it is also possible to use the Viola-Jones method and ASM to improve tracking quality. Face tracking methods usually accumulate error over time, so they can lose the face position after several hundred frames. To prevent this, in the present invention the ASM algorithm is run from time to time to re-initialize the tracking algorithm.

Face Tracking

In one or more embodiments, the next step comprises tracking the detected object in the video stream. In the present invention, the abovementioned Candide-3 model (see Ahlberg, J.: Candide-3, an updated parameterized face. Technical report, Linköping University, Sweden (2001), incorporated herein by reference) is used for tracking a face in a video stream. The mesh or mask corresponding to the Candide-3 model is shown in FIG. 3.

In one or more embodiments, a state of the model can be described by a shape units intensity vector, an action units intensity vector and a position vector. Shape units are the main parameters of a head and a face. In the present invention the following 10 units are used:

- Eyebrows vertical position
- Eyes vertical position
- Eyes width
- Eyes height
- Eye separation distance
- Nose vertical position
- Nose pointing up
- Mouth vertical position
- Mouth width
- Chin width

In one or more embodiments, action units are face parameters that correspond to some face movement. In the present invention the following 7 units are used:

- Upper lip raiser
- Jaw drop
- Lip stretcher
- Left brow lowerer
- Right brow lowerer
- Lip corner depressor
- Outer brow raiser

In one or more embodiments, the mask position in a picture can be described using 6 coordinates: yaw, pitch, roll, x, y, and scale. The main idea of the algorithm proposed by Dornaika et al. (Dornaika, F., Davoine, F.: On appearance based face and facial action tracking. IEEE Trans. Circuits Syst. Video Technol. 16(9):1107-1124 (2006), incorporated herein by reference) is to find the mask position which observes the region most likely to be a face. For each position it is possible to calculate the observation error: the value that indicates the difference between the image under the current mask position and the mean face. An example of the mean face and of the observation under the current position is illustrated in FIGS. 4(a)-4(b). FIG. 4(b) corresponds to the observation under the mask shown in FIG. 5.

In one or more embodiments, the human face is modeled as a picture with a fixed size (width=40 px, height=46 px) called a mean face. The Gaussian distribution proposed in the original algorithm has shown a worse result in comparison with the static image. So, a difference between the current observation and the mean face is calculated in the following way:

$$e(b) = \sum \left( \log(1 + I_m) - \log(1 + I_i) \right)^2. \quad (9)$$
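Equation 9 translates directly, assuming the mean face and the current observation are same-size grayscale patches:

```python
import numpy as np

def observation_error(mean_face, observation):
    """e(b) = sum((log(1 + I_m) - log(1 + I_i))^2), Equation (9); both
    inputs are same-size grayscale patches (40x46 px in the text)."""
    # np.log1p(x) computes log(1 + x)
    return float(np.sum((np.log1p(mean_face) - np.log1p(observation)) ** 2))
```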

The logarithm function makes tracking more stable.

In one or more embodiments, to minimize the error, a Taylor series is used, as was proposed by Dornaika et al. (see Dornaika, F., Davoine, F.: On appearance based face and facial action tracking. IEEE Trans. Circuits Syst. Video Technol. 16(9):1107-1124 (2006)). It was found that it is not necessary to sum up a number of finite differences when calculating an approximation to the first derivative. The derivative is calculated in the following way:

$$g_{ij} = \frac{W(y_t, b_t + \delta b_t)_{ij} - W(y_t, b_t - \delta b_t)_{ij}}{\delta_j}. \quad (10)$$

Here, $g_{ij}$ is an element of the matrix $G$. This matrix has size $m \times n$, where $m$ is large enough (about 1600) and $n$ is small (about 14). In the case of straightforward calculation, $n \cdot m$ division operations have to be done. To reduce the number of divisions, this matrix can be rewritten as a product of two matrices:

$$G = A \cdot B,$$

where the matrix $A$ has the same size as $G$ and its elements are

$$a_{ij} = W(y_t, b_t + \delta b_t)_{ij} - W(y_t, b_t - \delta b_t)_{ij}, \quad (11)$$

and the matrix $B$ is a diagonal matrix of size $n \times n$ with $b_{ii} = \delta_i^{-1}$.

Now the matrix $G_t^+$ has to be obtained, and here is where the number of divisions can be reduced:

$$G_t^+ = (G^T G)^{-1} G^T = (B^T A^T A B)^{-1} B^T A^T = B^{-1} (A^T A)^{-1} B^{-1} B A^T = B^{-1} (A^T A)^{-1} A^T. \quad (12)$$

After that transformation this can be done with n*n divisions instead ofm*n.

One more optimization was used here. If the matrix $G_t^+$ is created explicitly and then multiplied by the update vector, it leads to $n^2 m + n^3$ operations, but if $A^T$ and the update vector are multiplied first and then $B^{-1}(A^T A)^{-1}$ is applied to the result, there will be only $n m + n^3$ operations, which is much better because $n \ll m$.
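A sketch of this factorized application of $G_t^+$, assuming `b_diag` holds the diagonal entries $\delta_i^{-1}$ of $B$ and that $A$ has full column rank:

```python
import numpy as np

def apply_pseudo_inverse(A, b_diag, v):
    """Compute G_t^+ v with G = A B and diagonal B (Equation 12):
    G^+ = B^{-1} (A^T A)^{-1} A^T.  Multiplying A^T by the vector first
    avoids ever forming the m x n matrix G or its pseudo-inverse."""
    w = A.T @ v                          # n*m multiplications
    w = np.linalg.solve(A.T @ A, w)      # small n x n system
    return w / b_diag                    # apply B^{-1}: n divisions
```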

Thus, the step of tracking the detected object in the video stream in the present embodiment comprises creating a mesh that is based on the detected feature reference points of the object and aligning the mesh to the object on each frame.

It should also be noted that to increase tracking speed, in the present invention multiplication of matrices is performed in such a way that it can be boosted using ARM advanced SIMD extensions (also known as NEON). Also, the GPU is used instead of the CPU whenever possible. To get high performance from the GPU, operations in the present invention are grouped in a special way.

Thus, tracking according to the exemplary embodiment of the invention has the following advantageous features:

1. Before tracking, the logarithm is applied to the grayscale value of each pixel to be tracked. This transformation has a great impact on tracking performance.

2. In the procedure of gradient matrix creation, the step of each parameter depends on the scale of the mask.

Exemplary Computer Platform

FIG. 6 is a block diagram that illustrates an embodiment of a computer system 500 upon which various embodiments of the inventive concepts described herein may be implemented. The system 500 includes a computer platform 501, peripheral devices 502 and network resources 503.

The computer platform 501 may include a data bus 504 or other communication mechanism for communicating information across and among various parts of the computer platform 501, and a processor 505 coupled with bus 504 for processing information and performing other computational and control tasks. Computer platform 501 also includes a volatile storage 506, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 504 for storing various information as well as instructions to be executed by processor 505, including the software application implementing the real-time video processing techniques described above. The volatile storage 506 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 505. Computer platform 501 may further include a read only memory (ROM or EPROM) 507 or other static storage device coupled to bus 504 for storing static information and instructions for processor 505, such as a basic input-output system (BIOS), as well as various system configuration parameters. A persistent storage device 508, such as a magnetic disk, optical disk, or solid-state flash memory device, is provided and coupled to bus 504 for storing information and instructions.

Computer platform 501 may be coupled via bus 504 to a touch-sensitive display 509, such as a cathode ray tube (CRT), plasma display, or a liquid crystal display (LCD), for displaying information to a system administrator or user of the computer platform 501. An input device 510, including alphanumeric and other keys, is coupled to bus 504 for communicating information and command selections to processor 505. Another type of user input device is cursor control device 511, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 505 and for controlling cursor movement on touch-sensitive display 509. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane. To detect a user's gestures, the display 509 may incorporate a touchscreen interface configured to detect the user's tactile events and send information on the detected events to the processor 505 via the bus 504.

An external storage device 512 may be coupled to the computer platform 501 via bus 504 to provide extra or removable storage capacity for the computer platform 501. In an embodiment of the computer system 500, the external removable storage device 512 may be used to facilitate exchange of data with other computer systems.

The invention is related to the use of computer system 500 for implementing the techniques described herein. In an embodiment, the inventive system may reside on a machine such as computer platform 501. According to one embodiment of the invention, the techniques described herein are performed by computer system 500 in response to processor 505 executing one or more sequences of one or more instructions contained in the volatile memory 506. Such instructions may be read into volatile memory 506 from another computer-readable medium, such as persistent storage device 508. Execution of the sequences of instructions contained in the volatile memory 506 causes processor 505 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.

The term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to processor 505 for execution. The computer-readable medium is just one example of a machine-readable medium, which may carry instructions for implementing any of the methods and/or techniques described herein. Such a medium may take many forms, including but not limited to, non-volatile media and volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as the persistent storage device 508. Volatile media includes dynamic memory, such as volatile storage 506.

Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, a flash drive, a memory card, any other memory chip or cartridge, or any other medium from which a computer can read.

Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 505 for execution. For example, the instructions may initially be carried on a magnetic disk from a remote computer. Alternatively, a remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to the computer system can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal, and appropriate circuitry can place the data on the data bus 504. The bus 504 carries the data to the volatile storage 506, from which processor 505 retrieves and executes the instructions. The instructions received by the volatile memory 506 may optionally be stored on persistent storage device 508 either before or after execution by processor 505. The instructions may also be downloaded into the computer platform 501 via the Internet using a variety of network data communication protocols well known in the art.

The computer platform 501 also includes a communication interface, such as a network interface card 513 coupled to the data bus 504. Communication interface 513 provides a two-way data communication coupling to a network link 514 that is coupled to a local network 515. For example, communication interface 513 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 513 may be a local area network interface card (LAN NIC) to provide a data communication connection to a compatible LAN. Wireless links, such as the well-known 802.11a, 802.11b, 802.11g and Bluetooth, may also be used for network implementation. In any such implementation, communication interface 513 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

Network link 514 typically provides data communication through one or more networks to other network resources. For example, network link 514 may provide a connection through local network 515 to a host computer 516, or a network storage/server 522. Additionally or alternatively, the network link 514 may connect through gateway/firewall 517 to the wide-area or global network 518, such as the Internet. Thus, the computer platform 501 can access network resources located anywhere on the Internet 518, such as a remote network storage/server 519. On the other hand, the computer platform 501 may also be accessed by clients located anywhere on the local area network 515 and/or the Internet 518. The network clients 520 and 521 may themselves be implemented based on a computer platform similar to the platform 501.

Local network 515 and the Internet 518 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 514 and through communication interface 513, which carry the digital data to and from computer platform 501, are exemplary forms of carrier waves transporting the information.

Computer platform 501 can send messages and receive data, including program code, through the variety of network(s), including the Internet 518 and LAN 515, network link 514 and communication interface 513. In the Internet example, when the system 501 acts as a network server, it might transmit requested code or data for an application program running on client(s) 520 and/or 521 through the Internet 518, gateway/firewall 517, local area network 515 and communication interface 513. Similarly, it may receive code from other network resources.

The received code may be executed by processor 505 as it is received, and/or stored in persistent or volatile storage devices 508 and 506, respectively, or other non-volatile storage for later execution.

Finally, it should be understood that the processes and techniques described herein are not inherently related to any particular apparatus and may be implemented by any suitable combination of components. Further, various types of general purpose devices may be used in accordance with the teachings described herein. It may also prove advantageous to construct specialized apparatus to perform the method steps described herein. The present invention has been described in relation to particular examples, which are intended in all respects to be illustrative rather than restrictive. Those skilled in the art will appreciate that many different combinations of hardware, software, and firmware will be suitable for practicing the present invention. For example, the described software may be implemented in a wide variety of programming or scripting languages, such as Assembler, C/C++, Objective-C, Perl, shell, PHP, Java, as well as any now known or later developed programming or scripting language.

Moreover, other implementations of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. Various aspects and/or components of the described embodiments may be used singly or in any combination in the systems and methods for real-time video stream processing. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.

What is claimed is:
1. A method, comprising: detecting at least a portion of an object in frames of a video; obtaining a mean face based on a picture with a fixed size, the mean face being related to the object; obtaining a current observation of the object using the video; computing an observation error based on a square of a difference between a logarithm of a function of the current observation of the object and the logarithm of the function of the mean face with the fixed size, the portion of the object being detected based on the observation error; transforming a feature of the portion of the object within the frames of the video to generate modified frames with a modified feature, the feature associated with an element of the portion of the object and the feature being transformed in the modified frames within the video in which the portion of the object is detected while the video is provided at a computing device and the portion of the object is detected in the video; and providing the modified frames including the modified feature.

2. The method of claim 1, wherein transforming the feature comprises: generating a mesh based on one or more characteristic points of the portion of the object; and transforming the feature based on the mesh and the one or more characteristic points of the portion of the object.

3. The method of claim 2, wherein generating the mesh comprises: generating a first set of points on the mesh for characteristic points associated with the element of the portion of the object; generating a second set of points on the mesh based on the set of first points and a modification to be applied in generating the modified feature; and transforming the frames of the video based on the second set of points on the mesh.

4. The method of claim 3, further comprising: receiving a request for modification representing a modification to be applied to the feature; and generating the second set of points on the mesh based on the request for modification and the set of first points.

5. The method of claim 1, further comprising: identifying an area on the portion of the object in the video, the area corresponding to the feature of the portion of the object; and transforming the area on the portion of the object within the frames of the video to generate modified frames with at least one modified area.

6. The method of claim 5, further comprising: generating a mesh based on one or more characteristic points of the portion of the object; generating a first set of points on the mesh for characteristic points associated with the element of the portion of the object and identifying the area based on the first set of points generated on the mesh; and transforming the area on the portion of the object within the frames of the video based on the first set of points on the mesh.

7. The method of claim 1, further comprising: generating a mesh based on one or more characteristic points of the portion of the object; generating a grid associated with a background of the video; and transforming the feature based on the mesh and the one or more characteristic points of the portion of the object while maintaining the background of the video based on the grid.

8. A system, comprising: one or more processors; and a non-transitory processor-readable storage medium storing processor executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: detecting at least a portion of an object in frames of a video; obtaining a mean face based on a picture with a fixed size, the mean face being related to the object; obtaining a current observation of the object using the video; computing an observation error based on a square of a difference between a logarithm of a function of the current observation of the object and the logarithm of the function of the mean face with the fixed size, the portion of the object being detected based on the observation error; transforming a feature of the portion of the object within the frames of the video to generate modified frames with a modified feature, the feature associated with an element of the portion of the object and the feature being transformed in the modified frames within the video in which the portion of the object is detected while the video is provided at a computing device and the portion of the object is detected in the video; and providing the modified frames including the modified feature.

9. The system of claim 8, wherein transforming the feature comprises: generating a mesh based on one or more characteristic points of the portion of the object; and transforming the feature based on the mesh and the one or more characteristic points of the portion of the object.

10. The system of claim 9, wherein generating the mesh comprises: generating a first set of points on the mesh for characteristic points associated with the element of the portion of the object; generating a second set of points on the mesh based on the set of first points and a modification to be applied in generating the modified feature; and transforming the frames of the video based on the second set of points on the mesh.

11. The system of claim 10, wherein the operations further comprise: receiving a request for modification representing a modification to be applied to the feature; and generating the second set of points on the mesh based on the request for modification and the set of first points.

12. The system of claim 8, wherein the operations comprise: identifying an area on the portion of the object in the video, the area corresponding to the feature of the portion of the object; and transforming the area on the portion of the object within the frames of the video to generate modified frames with at least one modified area.

13. The method of claim 1, further comprising: initializing a tracking process to detect the portion of the object in the video; and, from time to time, re-initializing the tracking process to re-detect the portion of the object in the video to continue modifying the feature.

14. The method of claim 1, wherein the portion of the object is detected based on a shape units intensity vector, an action units intensity vector and a position vector, the position vector indicating vertical positions of eyebrows, eyes, nose and mouth, the position vector further indicating widths of eyes, mouth and chin, the action units intensity vector indicating facial movement comprising upper lip raising, lip stretching, brow rising, and brow lowering.

15. A non-transitory processor-readable storage medium storing processor executable instructions that, when executed by a processor of a machine, cause the machine to perform operations comprising: detecting at least a portion of an object in frames of a video; obtaining a mean face based on a picture with a fixed size, the mean face being related to the object; obtaining a current observation of the object using the video; computing an observation error based on a square of a difference between a logarithm of a function of the current observation of the object and the logarithm of the function of the mean face with the fixed size, the portion of the object being detected based on the observation error; transforming a feature of the portion of the object within the frames of the video to generate modified frames with a modified feature, the feature associated with an element of the portion of the object and the feature being transformed in the modified frames within the video in which the portion of the object is detected while the video is provided at a computing device and the portion of the object is detected in the video; and providing the modified frames including the modified feature.

16. The non-transitory processor-readable storage medium of claim 15, wherein transforming the feature comprises: generating a mesh based on one or more characteristic points of the portion of the object; and transforming the feature based on the mesh and the one or more characteristic points of the portion of the object.

17. The method of claim 1, further comprising: obtaining grayscale values of each pixel of the portion of the object in the video; applying a logarithm to the grayscale values of each pixel of the portion of the object in the video before tracking the portion of the object; and tracking the portion of the object in the video based on the logarithm of the grayscale values.

18. The method of claim 1, wherein the portion of the object is detected based on an eye separation distance, a mouth width, a chin width and action unit information indicating a jaw drop and lip corner depression.

19. The method of claim 1, further comprising: generating a plurality of square grids associated with one or more regions; and modifying the one or more regions based on the plurality of square grids to distort the one or more regions.

20. The method of claim 1, further comprising: generating a square grid associated with a background of the video; and applying distortion to the background of the video based on the square grid.